Chapter 1


Introduction and Motivation 


Linear prediction has been a popular tool for speech processing since its
introduction to the field. There are several advantages and disadvantages
associated with this method as a speech processing tool [1]. Still, because of
its accuracy and simplicity, it is one of the most popular techniques.

Among the various representations of the results of linear prediction
analysis, reflection coefficients (PARCOR coefficients) are more interesting.
This is because of some properties of the lattice filter associated with them,
e.g. orthogonality and low coefficient sensitivity. Above all, since the
reflection coefficient is a normalized quantity, its magnitude is always
bounded [2]. It is empirically established that for telephone-quality speech,
consideration of the first ten reflection coefficients is sufficient [3,4].

Reflection coefficients are related to a number of other sets of parameters
[1]. Examples of such parameters are the log area ratio and the area function.
A vocal tract can be modelled as a series of cylindrical waveguides of the
same length but different diameters. The i-th log area ratio is nothing but
the natural logarithm of the ratio of the area of the (i+1)-st cylindrical
section to that of the i-th cylindrical section. The i-th area function is the
(p - i)-th area measured in terms of the glottis area (A_p), normalized to
unity.


[Figure: cylindrical sections A10, A9, ..., A0, with the glottis end at A10
and the lip end at A0]

Figure 1.1: Waveguide Model of Vocal Tract

There is an isomorphism between the i-th reflection coefficient and the i-th
log area ratio. This results in a recursive relation between the area
parameters and the reflection coefficients. This is the first point of
motivation for this work.

Let us consider phonetics for a while. Most of the Indian languages show a
regular articulatory phonetic pattern in their alphabet(s). Every non-vowel
character represents a CV utterance with a fixed V /ə/. The first 25 non-vowel
characters are classified in a startlingly systematic articulatory phonetic
order [5,6,7]. The classification according to modern phonology [8] is given
in Table 1.1.
Place of        Manner of Articulation
Articulation    Unvoiced                     Voiced                       Nasal
                Unaspirated   Aspirated      Unaspirated   Aspirated

Velar           (क) /kə/      (ख) /kʰə/      (ग) /gə/      (घ) /gʱə/      (ङ) /ŋə/
Alveolar        (च) /tʃə/     (छ) /tʃʰə/     (ज) /dʒə/     (झ) /dʒʱə/     (ञ) /ɲə/
Retroflex       (ट) /ʈə/      (ठ) /ʈʰə/      (ड) /ɖə/      (ढ) /ɖʱə/      (ण) /ɳə/
Dental          (त) /t̪ə/      (थ) /t̪ʰə/      (द) /d̪ə/      (ध) /d̪ʱə/      (न) /nə/
Bilabial        (प) /pə/      (फ) /pʰə/      (ब) /bə/      (भ) /bʱə/      (म) /mə/

Table 1.1: First 25 CV clusters of Hindi alphabet


Table 1.1 will henceforth be referred to as the alphabet matrix. Each row of
the alphabet matrix corresponds to a single place of articulation. With this
and the recursion stated above, one would expect to see some similarity in the
trajectories of area functions row-wise. (The alveolar non-nasals are
affricates; all other non-nasals are stops.)

The comparison of analysis through the waveguide model and the articulatory
phonetic behaviour of a VCV cluster becomes more interesting because of the
following facts:

• The waveguide model assumes constant vocal tube diameters over a given
analysis frame. The classification of consonants in articulatory phonetics is
based on various constrictions in the vocal tract. This involves dynamic
behaviour of the vocal tract.

• In the waveguide model, all the sections of the vocal tract are assumed to
have identical length. The consonants, on the other hand, are classified as
per constrictions at places nonuniformly distributed over the vocal tract.
Alternately stated, comparison of reflection coefficient trajectories should
reflect the validity as well as some possible shortcomings of the cylindrical
waveguide model.

It is with this consideration in mind that an attempt is made in this
investigation to characterize the sounds corresponding to the Devanagari
alphabet (Vyanjana section) using the LPC model and to study its usefulness
for text-to-speech conversion for Indian languages. Another, stronger point of
motivation is the availability of hardware products dedicated to linear
prediction analysis [9].

To study the trajectories of the parameters, the recording is carried out
using the 'Speech Interface Unit' [10]. More details of the recording can be
found in Appendix A.

As a part of the work, software is developed which is briefly described
below.

1. Analysis section: This section derives the gain and reflection
coefficients through Durbin's algorithm [1] and the pitch through the SIFT
algorithm [11].

2. Synthesis section: This section utilizes a two-multiplier lattice
structure. Its main purpose is to see the validity of the results of the
analysis section.

3. Conversion section: This section consists of various programmes written in
C to convert the data formats into ones required by the various supporting
software packages available.

4. Control section: This section consists of a standard K shell UNIX
makefile, a programme written in C which takes in an argument and executes
various programmes by supplying them the argument with appropriate
augmentations and attributes, and another C file containing a function which
terminates the calling programme if the file pointer passed to it is NULL.

The control section may seem trivial at present but will be extremely useful
for organization of the database while developing a full-scale text-to-speech
conversion system. In order to maintain the portability of the C code, the
ANSI C standard is followed throughout.

The organization of the thesis is as follows:

• Chapter 2 discusses the organization of the Hindi/Sanskrit alphabet
according to modern phonology.

• Chapter 3 discusses some aspects of linear prediction, the synthesis model,
the SIFT algorithm for pitch determination and log area ratios.

• Chapter 4 overviews the software developed.

• Chapter 5 discusses the results.

• Chapter 6 covers conclusions and suggestions for future work.

• Appendix A gives details about the recording of the database.



Chapter 2 

Organization of Hindi 
Alphabet 


Like most of the other Indian languages, Hindi too has an alphabet of
phonetic nature. It has two distinct parts: vowels, diphthongs and some trills
(स्वर) /svərə/, and CV clusters with a fixed V /ə/. The latter section is
called (व्यञ्जन) /vjəɲdʒənə/. For convenience, this section will be called CV
clusters.

Pure consonants are represented by a diacritical mark called (हलन्त)
/hələntə/. CCV and CCCV clusters are represented by joined orthographical
representation [5,6].

2.1 Vowels, Diphthongs and Some Trills

This section has members in the following order: (अ) /ə/, (आ) /a/, (इ) /i/,
(ई) /iː/, (उ) /u/, (ऊ) /uː/, (ऋ) /r̩/, (ॠ) /r̩ː/, (ऌ) /l̩/, (ए) /e/, (ऐ) /ai/,
(ओ) /o/, (औ) /au/, (अं) /əm/ and (अः) /əh/.


2.2 CV Clusters with a Fixed V /ə/

Hindi is affluent in consonants. Hindi is a Sanskrit-based language. Sanskrit
grammarians rejected the possibility of utterance of consonants without a
vowel following or preceding them [5,6]. So Sanskrit, and hence Hindi, has
characters representing CV utterances. For a constant C, such a collection is
called (स्वरमाला) /svərəmala/ and is considered to be a derivative of the
alphabet, which is nothing but the collection of the first letters of each
/svərəmala/. So we get a constant vowel /ə/ in the Hindi alphabet.

• For the first 25 characters we get a periodic pattern of UV UA, UV A, V UA,
V A and nasal CV clusters. Columnwise they are velar stops, alveo-palatal
affricates, retroflex stops, dental stops and bilabial stops (except for
nasals of course, which are sonorants), respectively [Table 1.1].

• The other section contains the following CV clusters in the following
order: (य) /jə/ palatal glide, (र) /rə/ retroflex trill, (ल) /lə/ alveolar
lateral, (व) /və/ dental glide, UV fricatives of palatal, retroflex and dental
articulatory positions (श) /ʃə/, (ष) /ʂə/ and (स) /sə/, and (ह) /hə/, a V
velar fricative.

2.3 Further Notes on Orthography and Phonetics of Hindi

In spite of this vast collection of consonants, this covers neither the full
Devanagari alphabet nor a complete set of Hindi phonetics. To take care of
Persian and English influences and dialectal requirements, Hindi has adopted
various diacritical marks (e.g. the dot (नुक़्ता) /nuqta/ is used to represent
'bad' consonants).

Still, some other established CV clusters in one or the other Indian
languages are not used in Hindi (e.g. (ळ) /ɭə/). Hindi or any other Indian
alphabet is not able to cover allophonic or co-articulatory effects.

2.4 Choice of the Basis of Comparison

In addition to the above facts and the likely systematic behaviour of
parameters, phonetically there is not much discrepancy in the first 25 CV
clusters among most of the Indian languages.

In order to see the variation of parameters over the alphabet matrix, a
10-coefficient lattice structure is chosen. With this overview of the Hindi
alphabet and its ordering, it seems important to see how the aforementioned
model behaves with respect to different CV clusters.

Since the initial errors in the linear prediction model are large and the
consonant duration and energy in a CV cluster are (generally) less than those
of the vowel counterpart, it is not meaningful to choose the CV utterance as
the base of comparisons. This conjecture was also checked practically by
synthesizing on the developed software and available setup. A lot of ambiguity
and distortion were detected. So the analysis was carried out on VCV clusters.



Chapter 3 

The Linear Prediction 


The motivation behind choosing a linear prediction model was mentioned in
Chapter 1. For the sake of completeness, this chapter briefly discusses its
well-known theory and the SIFT algorithm for pitch determination.

3.1 Linear Estimation

The general problem of linear estimation can be stated as follows.

We are given p random variables (r.v.s) $\{X_i\}_{i=1}^{p}$ as data and we
wish to find p constants $\{a_i\}_{i=1}^{p}$ such that if we estimate the r.v.
$s$ (signal) by the sum

\[ \hat{s} = \sum_{i=1}^{p} a_i X_i \tag{3.1} \]

then the m.s. value $E\{e^2\}$ is minimum, where

\[ e = s - \hat{s} \tag{3.2} \]

\[ e = s - \sum_{i=1}^{p} a_i X_i \tag{3.3} \]

$\hat{s}$ becomes the homogeneous linear MS estimate of the signal in terms of
the data. This estimate is given by the conditional mean
$E\{s \mid \{X_i\}_{i=1}^{p}\}$.

As per the projection theorem, $E\{e^2\}$ is minimum for $\{a_i\}_{i=1}^{p}$
if the error $e$ is orthogonal to the data:

\[ E\{e X_i^{*}\} = 0, \quad 1 \le i \le p \tag{3.4} \]

For real $X_i$'s,

\[ E\{e X_i\} = 0, \quad 1 \le i \le p \tag{3.5} \]

\[ E\Big\{\Big(s - \sum_{j=1}^{p} a_j X_j\Big) X_i\Big\} = 0, \quad 1 \le i \le p \tag{3.6} \]

Setting $i = 1, 2, \ldots, p$, $R_{ij} = E\{X_i X_j\}$ and
$R_{0j} = E\{s X_j\}$, we get the solution

\[ \sum_{i=1}^{p} R_{ij} a_i = R_{0j}, \quad 1 \le j \le p \tag{3.7} \]

These are the well-known Yule-Walker equations.

3.2 Linear Prediction of Discrete Time Series in terms of Finite Past

The r-step predictor of a discrete random process $s[n]$ is the estimate of
$s[n+r]$ in terms of $s[n]$ and its past:

\[ \hat{s}[n+r] = E\{ s[n+r] \mid s[n-k],\ k \ge 0 \} \tag{3.8} \]

For the 1-step predictor and a stationary process $s$,

\[ \hat{s}[n] = E\{ s[n] \mid s[n-k],\ k \ge 1 \} \tag{3.9} \]

\[ \hat{s}[n] = \sum_{k=1}^{\infty} a_k\, s[n-k] \tag{3.10} \]

However, for practical considerations, we limit ourselves to at most p past
values of $s[n]$:

\[ \hat{s}_p[n] = E\{ s[n] \mid s[n-k],\ 1 \le k \le p \} \tag{3.11} \]

\[ \hat{s}_p[n] = \sum_{k=1}^{p} a_k^{p}\, s[n-k] \tag{3.12} \]

so the forward error becomes

\[ e_p[n] = s[n] - \hat{s}_p[n] \tag{3.13} \]

From the projection theorem,

\[ E\{ (s[n] - \hat{s}_p[n])\, s[n-k] \} = 0, \quad 1 \le k \le p \tag{3.14} \]

and thus, with $R[m] = E\{s[n]\,s[n-m]\}$, we get the Yule-Walker equations in
the form

\[
\begin{aligned}
R[1]a_1^{p} + R[2]a_2^{p} + \cdots + R[p]a_p^{p} &= R[0] - P_p \\
R[0]a_1^{p} + R[1]a_2^{p} + \cdots + R[p-1]a_p^{p} &= R[1] \\
&\;\;\vdots \\
R[p-1]a_1^{p} + R[p-2]a_2^{p} + \cdots + R[0]a_p^{p} &= R[p]
\end{aligned}
\]

where $P_p = E\{e_p^2[n]\} = \Delta_{p+1}/\Delta_p$ is the minimum MS error and
$\Delta_p$ is the determinant of the correlation matrix $R_p$ formed by the
coefficients of the last p equations. We note here that the elements on each
diagonal of the matrix $R_p$ are identical, i.e., the matrix is Toeplitz. Hence
we can apply numerous recursive algorithms to solve for $\{a_k^{p}\}$.
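
In practice the values R[0], ..., R[p] that fill the Toeplitz matrix have to be estimated from a finite frame of speech. A minimal sketch in C, assuming the common short-time estimate R[m] = sum over n of w[n] w[n+m] on a frame w of N samples; the name `autocorr_sketch` is illustrative and is not taken from the thesis software:

```c
/*
 * Short-time autocorrelation of one frame (an assumed estimator):
 *   r[m] = sum_{n=0}^{N-1-m} w[n] * w[n+m],   m = 0..p
 * w : frame of n_samples values
 * r : output array with room for p+1 values
 */
void autocorr_sketch(const double *w, int n_samples, int p, double *r)
{
    int m, n;

    for (m = 0; m <= p; m++) {
        double acc = 0.0;
        for (n = 0; n + m < n_samples; n++)
            acc += w[n] * w[n + m];
        r[m] = acc;
    }
}
```

Because r[m] depends only on the lag m, the matrix built from these values has identical entries along each diagonal, which is the Toeplitz property used above.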

3.3 Durbin's Algorithm

This algorithm is one of the most efficient recursive algorithms to solve the
above problem. Throughout the discussion in this section e[i] denotes the
error at the i-th step of the lattice filter.

BEGIN
    e[0] = R[0];
    FOR i = 1 TO i = p DO
    BEGIN
        k[i] = (R[i] - SUM(j = 1 TO i-1) a_j[i-1] R[i-j]) / e[i-1];
        a_i[i] = k[i];
        FOR j = 1 TO j = (i - 1) DO
        BEGIN
            a_j[i] = a_j[i-1] - k[i] a_(i-j)[i-1];
        END;
        e[i] = (1 - k[i]^2) e[i-1];
    END;
END

The final solution is given by

\[ a_k^{p} = a_k[p], \quad 1 \le k \le p \tag{3.15} \]

and

\[ E\{e^2\} = e[p] \tag{3.16} \]
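
The recursion listed above translates almost line by line into C. The following is a sketch under stated assumptions, not the thesis implementation: the name `durbin_sketch`, the bound `LPC_MAX_ORDER` and the 1-based array layout are choices made for this illustration only.

```c
#define LPC_MAX_ORDER 32   /* hypothetical upper bound on the order p */

/*
 * Durbin's recursion.
 * r[0..p] : autocorrelation values R[0..p]
 * a[1..p] : predictor coefficients on return (a[0] is unused)
 * k[1..p] : reflection coefficients on return (k[0] is unused)
 * Returns the final error e[p], or -1.0 if the input is unusable.
 */
double durbin_sketch(const double *r, int p, double *a, double *k)
{
    double prev[LPC_MAX_ORDER + 1];
    double err;
    int i, j;

    if (p < 1 || p > LPC_MAX_ORDER || r[0] <= 0.0)
        return -1.0;

    err = r[0];                              /* e[0] = R[0] */
    for (i = 1; i <= p; i++) {
        double acc = r[i];
        for (j = 1; j < i; j++)
            acc -= a[j] * r[i - j];          /* R[i] - sum a_j[i-1] R[i-j] */
        k[i] = acc / err;

        for (j = 1; j < i; j++)
            prev[j] = a[j];                  /* keep a_j[i-1] */
        a[i] = k[i];
        for (j = 1; j < i; j++)
            a[j] = prev[j] - k[i] * prev[i - j];

        err *= (1.0 - k[i] * k[i]);          /* e[i] = (1 - k[i]^2) e[i-1] */
    }
    return err;
}
```

For the autocorrelation sequence 1, 0.5, 0.25 and p = 2 this gives k[1] = 0.5, k[2] = 0 and a final error of 0.75, as expected for a first-order process.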


3.4 The Synthesis Model

[Figure: input e[n] passes through the filter H(z) to give the output s[n]]

Figure 3.1: All-Pole Filter

Let $H(z)$ be an all-pole filter with the transfer function

\[ H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \tag{3.17} \]

and $s[n]$ be the output of the filter for some input process $e[n]$, i.e.,

\[ s[n] = G\,e[n] + \sum_{k=1}^{p} a_k\, s[n-k] \tag{3.18} \]
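
The difference equation (3.18) can be run directly as a direct-form recursion. A minimal sketch in C, assuming zero initial conditions; the function name `allpole_sketch` and the 1-based coefficient array are illustrative choices, not part of the thesis software:

```c
/*
 * Direct-form all-pole synthesis:
 *   s[n] = gain * e[n] + sum_{k=1}^{p} a[k] * s[n-k]
 * a[1..p] are the predictor coefficients (a[0] is unused).
 * Samples before n = 0 are taken to be zero.
 */
void allpole_sketch(const double *e, int n_samples,
                    const double *a, int p, double gain, double *s)
{
    int n, k;

    for (n = 0; n < n_samples; n++) {
        double acc = gain * e[n];
        for (k = 1; k <= p && k <= n; k++)
            acc += a[k] * s[n - k];
        s[n] = acc;
    }
}
```

With p = 1, a[1] = 0.5, G = 2 and a unit impulse as input, the output is 2, 1, 0.5, 0.25, the decaying response expected from a single real pole.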

The solution to the determination of $\{a_k\}_{k=1}^{p}$ such that $E\{e^2\}$
is minimized is discussed in the previous sections. Now comes the question of
determination of the nature of $e[n]$.

In unvoiced sections, the speech signal s[n] is noise-like. So, selection of
a random noise generator is proper as the source e[n]. On the other hand, in
voiced sections, the speech signal is almost periodic, a damped sinusoid. So
one would use as excitation a periodic impulse generator with the period being
the pitch period. In either of the cases, the m.s. value of the signal e[n]
and the filter coefficients

Any IIR/FIR system with rational transfer function can be represented by the
direct implementation form or its lattice equivalent. In our case the direct
implementation coefficients are {a_k}; the lattice coefficient counterpart
{k_i} is obtained through Durbin's algorithm [1]. We can utilize the IIR
lattice structure with reflection coefficients, a gain block, a V/UV switch, a
random number generator and a periodic impulse generator (with controllable
period) to form the synthesis model.



Figure 3.2: The Synthesis Model
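
The lattice part of the synthesis model can be sketched in C as follows. This is an illustration rather than the thesis code: it assumes the sign convention of the Durbin listing in Section 3.3 (so that a first-order section reduces to s[n] = x[n] + k_1 s[n-1]), the caller supplies the already gain-scaled excitation x[n], and the names `lattice_state`, `lattice_reset` and `lattice_step` are hypothetical.

```c
#define LATTICE_MAX_ORDER 32   /* hypothetical bound on the order p */

typedef struct {
    int p;                              /* filter order                */
    double b[LATTICE_MAX_ORDER + 1];    /* b[i] holds b_i[n-1]         */
} lattice_state;

/* Clear the delay line and set the order (clamped to the bound). */
void lattice_reset(lattice_state *st, int p)
{
    int i;

    st->p = (p > LATTICE_MAX_ORDER) ? LATTICE_MAX_ORDER : p;
    for (i = 0; i <= LATTICE_MAX_ORDER; i++)
        st->b[i] = 0.0;
}

/*
 * One output sample of the all-pole lattice, two multiplies per stage:
 *   f_{i-1}[n] = f_i[n] + k[i] * b_{i-1}[n-1]
 *   b_i[n]     = b_{i-1}[n-1] - k[i] * f_{i-1}[n]
 * k[1..p] are the reflection coefficients (k[0] is unused).
 */
double lattice_step(lattice_state *st, const double *k, double x)
{
    double f = x;                            /* f_p[n] */
    int i;

    for (i = st->p; i >= 1; i--) {
        f += k[i] * st->b[i - 1];            /* f_{i-1}[n] */
        st->b[i] = st->b[i - 1] - k[i] * f;  /* b_i[n], kept for next call */
    }
    st->b[0] = f;                            /* b_0[n] = s[n] */
    return f;
}
```

For k[1] = 0.5 and k[2] = 0.25 the impulse response starts 1, 0.375, 0.390625, which matches the direct form with a_1 = k_1(1 - k_2) = 0.375 and a_2 = k_2 = 0.25 given by the Durbin step-up.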
3.5 V/UV Decision and Pitch Determination: the SIFT Algorithm

There are a number of algorithms available for both V/UV decision and pitch
determination [1,2,12]. The SIFT algorithm is selected because it is
moderately accurate and computationally shares some functions with the linear
prediction analysis.

The idea of the SIFT algorithm is to pass the given speech through a low-pass
filter to remove higher formants, then to pass the output through a low-order
inverse filter (continuing with the notation of earlier sections, the inverse
filter is the one which takes s[n] as the input and gives e[n] as the output),
and to autocorrelate the output to determine whether it is voiced or not
[2,10].



[Figure: block diagram whose final output is the pitch period]

Figure 3.3: Block Diagram of SIFT Algorithm

Details of the implementation may be found in Chapter 4.

3.6 Relations with Cylindrical Waveguide Model

Though the theory behind the derivation of the relations between the
reflection coefficients and the log area ratios is rather involved, the
assumptions behind such a modelling and the relation by itself are noteworthy
[2].

Assumptions

1. The vocal tract is assumed to consist of p interconnected sections of
equal length. Each individual section is of uniform area.

2. The transverse dimension of each section is small enough compared with a
wavelength so that the sound propagation through an individual section can be
treated as a plane wave.

3. The sections are rigid so that the internal losses due to wall vibration,
viscosity and heat conduction are negligible.

4. Normal assumptions leading to elementary wave propagation are valid.

5. The model is linear and is uncoupled from the glottis.

6. The effects of the nasal tract can be ignored.
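Relation

Let $A_m$ be the cross-sectional area of the m-th section and $g_m$ be the
m-th log area ratio; then the relation is given by

\[ g_m = \ln\frac{A_{m+1}}{A_m} \tag{3.19} \]

with the right-hand side determined by the m-th reflection coefficient through
the isomorphism mentioned in Chapter 1.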
3.7 Selecting Various Parameters for the Model

1. Sampling Rate f_s: Since LPC-based speech processing is meant for
telephone-quality speech with passband 300-3500 Hz, the sampling frequency of
8000 Hz is chosen.

2. Order of Filter P: The empirically established value is 10. From the vocal
tract model, the model should be able to memorize up to 2l/c seconds, where l
is the average length of the vocal tract (say 17 cm) and c is the velocity of
sound (about 34 cm/ms). Therefore the necessary memory becomes 1 ms. For 8 kHz
sampling, it comes out to be 8 samples. To be nearer to the standards, P is
kept to be 10.

3. Analysis Frame Length N: This should be small enough so that the vocal
tract movement can be considered negligible (about 25 ms). This comes out to
be 200 samples. For transient sounds, a smaller interval is desirable.

4. Windowing: In order to avoid spectral abruptness because of the difference
between s[0] and s[N-1], windowing is desirable. So a Hamming window is
applied.

5. Synthesis Frame Length framelen: In order to maintain smoothness in speech
reproduction, 2/3 of the analysis length is taken to be the synthesis frame
length, i.e., there is an overlap of 1/3 frame for the analysis.
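
The arithmetic behind the filter order and the analysis frame length can be checked with a few lines of C. This is only a sketch: the function names are hypothetical, and the figures used in the comments (17 cm, 34000 cm/s, 25 ms, 8000 Hz) are the ones quoted in the list above.

```c
/*
 * Number of samples spanned by the round trip 2*l/c along the vocal tract.
 * length_cm   : vocal tract length l in cm        (17 in the text)
 * c_cm_per_s  : velocity of sound in cm/s         (34000 in the text)
 * fs_hz       : sampling frequency in Hz          (8000 in the text)
 */
double round_trip_samples_sketch(double length_cm, double c_cm_per_s,
                                 double fs_hz)
{
    return 2.0 * length_cm / c_cm_per_s * fs_hz;
}

/* Frame length in samples for a frame of duration_s seconds, rounded. */
int frame_length_sketch(double duration_s, double fs_hz)
{
    return (int)(duration_s * fs_hz + 0.5);
}
```

With the quoted values the first function gives about 8 samples, which is why an order of at least 8 is needed, and the second gives N = 200 for a 25 ms frame.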



Chapter 4 

Software Overview 


As discussed in the introduction, the software developed can be divided into
four sections: (i) Analysis section, (ii) Synthesis section, (iii) Conversion
section and (iv) Control section.

4.1 Analysis Section

There is a programme lpcan.c which takes in a binary string file of 16-bit
signed integer speech data and gives a binary string file of floating point
values of gain, pitch period and k[0] to k[10], the reflection coefficients,
frame by frame. (Of course k[0] is a useless quantity and is taken care of by
the synthesis section.) Important sections of the algorithm are discussed
below.
Program lpcan.c
#include "lpc.h";
#include "lpcan.h";
array h[N], fa[6] of double;

BEGIN
    integer n, frame;

    Assign default values to the file pointers;
    Modify file pointer values considering the command line; (* 'asgnargv' *)
    Check the value of file pointers; (* 'checkNull' *)
    Calculate the Hamming window h[], LPF coefficients fa[]; (* globals *)
    n = 0; (* sample index *)
    frame = 1; (* frame index *)

    REPEAT
    BEGIN
        IF (NOT (EOF(input file))) THEN
        BEGIN
            Read an integer from input file; (* speech sample *)
            Pass the sample through 900 Hz LPF; (* 'lpf1' *)
            Increment the sample index;
        END;

        IF ((n = N) OR (EOF(input file))) THEN
        BEGIN
            Calculate gain and reflection coefficients; (* 'calcCoeff' *)
            Calculate pitch period; (* 'calcPitch' *)
            Note down the coefficients in the output file;
            IF (verbose mode) THEN display frame # and parameters;
            Shift the last 1/3 part of frame as the first 1/3 part and clear
            the rest of the frame;
            Modify the sample index and frame index;
        END;
    END;
    UNTIL (EOF(input file));

    Close the files;
END


Function 'asgnargv' checks for the arguments "-sf", "-pf" and "-v" in the
command line and assigns the argument next to them to the input file pointer,
output file pointer and verbose flag. The operation is carried out in a
'WHILE DO' loop and hence no order of argument specification is imposed. If
there is some mistake in the command line, it terminates the programme,
displaying a 'usage' message at the file 'stderr'.

Function 'checkNull' is the most referred function. It takes in a file
pointer and a string. In case the file pointer points to NULL, it terminates
the calling programme and displays the string in the file 'stderr'.
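
A helper with this behaviour is only a few lines of ANSI C. The sketch below shows one plausible shape; the name `check_null_sketch` and the exact message format are assumptions, since the source of the thesis function is not reproduced here.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Terminate the calling programme if fp is NULL, after writing the
 * supplied message to stderr. Returns normally otherwise.
 */
void check_null_sketch(FILE *fp, const char *msg)
{
    if (fp == NULL) {
        fprintf(stderr, "%s\n", msg);
        exit(EXIT_FAILURE);
    }
}
```

A typical call would follow every `fopen`, for example `check_null_sketch(fp, "cannot open input file")`, so that the main programme never has to test the pointer itself.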

Function 'lpf1' is a second order LPF with cut-off frequency corresponding to
900 Hz in the analog domain.

function calcCoeff(s[N] array of short integers; h[N]; VAR k[P+1] array of
float; VAR G float)

(* s[] is the speech frame, h[] the Hamming window, k[] the reflection
   coefficient array and G is the gain of the model *)

w[N], r[P+1] array of float;
(* w[] is the 'Hammed' speech frame, r[] is the autocorrelation array *)

BEGIN
    window(s,h,w,N); (* window the speech *)
    average(w,N); (* make the average of w[] zero *)
    (* The Linear Prediction results are for 'homogeneous' estimates *)
    autoCorrel(w,N,P,r); (* calculate first P+1 autocorrelations *)
    durbin(r,P,k,G,P); (* Durbin's algorithm *)
END


Function 'calcPitch' is the implementation of the SIFT algorithm discussed in
Chapter 3. After the decimation, the average of the signal is made zero. The
interpolation is parabolic, with the values adjacent to the maximum of the
autocorrelation taken into account. The UV/V decision is taken by passing the
interpolated maximum value 'ival' along with the interpolated position 'x' to
the function 'decision'.
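
The parabolic interpolation step can be illustrated as follows. This is a generic three-point parabola fit, given as a sketch; the function name `parabolic_peak_sketch` and its argument layout are assumptions and may differ from what 'calcPitch' actually does.

```c
/*
 * Fit a parabola through three autocorrelation values around a maximum:
 *   ym1 at lag m-1, y0 at lag m (the discrete maximum), yp1 at lag m+1.
 * On return *x is the interpolated peak position and *ival the
 * interpolated peak value. If the three points are collinear the
 * discrete maximum is returned unchanged.
 */
void parabolic_peak_sketch(double ym1, double y0, double yp1, int m,
                           double *x, double *ival)
{
    double denom = ym1 - 2.0 * y0 + yp1;
    double delta = 0.0;

    if (denom != 0.0)
        delta = 0.5 * (ym1 - yp1) / denom;   /* offset from lag m */

    *x = (double)m + delta;
    *ival = y0 - 0.25 * (ym1 - yp1) * delta;
}
```

For samples of the parabola 1 - (t - 0.25)^2 taken at t = -1, 0, 1 around lag 10, the routine recovers the true peak at 10.25 with value 1.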

function decision(ival, x, VAR per float);

static vuv integer; (* flag to refer to the decision about the previous
                       frame *)

BEGIN
    IF ((ival > 0.4) OR ((ival > 0.3) AND (vuv = 3))) THEN
    BEGIN
        vuv = (vuv & 1) * 2 + 1; (* this frame is voiced *)
        per = x; (* return the pitch period *)
    END
    ELSE
    BEGIN
        vuv = (vuv & 1) * 2; (* this frame is unvoiced *)
        per = 0.0; (* return the pitch period showing this *)
    END;
END
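
The two-frame memory in 'vuv' is easier to follow in C. The sketch below mirrors the listing above with one deliberate change, stated here as an assumption: the flag is passed in by pointer instead of being a static variable, so that the behaviour can be exercised in isolation. The name `decision_sketch` is hypothetical.

```c
/*
 * Voiced/unvoiced decision with hysteresis.
 * *vuv keeps two bits: bit 0 = this frame voiced, bit 1 = previous frame
 * voiced. A weaker peak (above 0.3) is accepted only when the two most
 * recent frames were both voiced (*vuv == 3).
 * Returns the pitch period x for a voiced frame and 0.0 otherwise.
 */
double decision_sketch(double ival, double x, int *vuv)
{
    if (ival > 0.4 || (ival > 0.3 && *vuv == 3)) {
        *vuv = (*vuv & 1) * 2 + 1;   /* shift history, mark voiced */
        return x;
    }
    *vuv = (*vuv & 1) * 2;           /* shift history, mark unvoiced */
    return 0.0;
}
```

Starting from `vuv = 0`, two strong frames bring the flag to 3, after which a frame with a peak of 0.35 is still accepted as voiced; the same peak right after an unvoiced frame is rejected.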

4.2 Synthesis Section

Program lpcsyn.c
#include "lpc.h";
#include "lpcsyn.h";

BEGIN
    integer frame;

    Assign default values to the file pointers;
    Modify file pointer values considering the command line; (* 'asgnargv' *)
    Check the value of file pointers; (* 'checkNull' *)
    frame = 1;

    Allocate memory to the synthesis buffer;
    IF (no allocation of memory) THEN
    BEGIN
        Display error message;
        Exit;
    END;

    REPEAT
    BEGIN
        Read in the frame of the parameters;
        IF (verbose mode) THEN display frame # and parameters;
        Synthesize the speech using the parameters; (* 'synthesize' *)
        IF (frame <> 1) THEN store the speech;
        Increment frame index;
    END;
    UNTIL (EOF(input file));

    Close all the files;
    Free the memory occupied by the synthesis buffer;
END


function synthesize(frame integer; gain, k[P+1], VAR y[framelen] float);

BEGIN
    IF (frame = 1) THEN
    BEGIN
        Initialize all the 'previous' parameters by present parameters;
        Return;
    END
    ELSE
    BEGIN
        Set present parameters into local static variables as new parameters;
        IF (both the previous and the present frames are voiced) THEN
            set slopes for all the parameters for linear interpolation;
        ELSE set all slopes to zero;

        FOR i = 0 TO i = framelen DO
        BEGIN
            Synthesize using two multiplier lattice model and present
            parameters;
            Update present parameters using linear interpolation;
        END;

        Set old parameters equal to new parameters;
    END
END
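
The slope-setting step of this routine can be sketched in C. The example assumes that each parameter is moved from its previous value to its new value in equal increments over one synthesis frame; the thesis does not spell out the exact formula, and the name `set_slopes_sketch` is hypothetical.

```c
/*
 * Per-sample increments for linear interpolation of synthesis parameters.
 * prev, next  : parameter sets of the previous and the present frame
 * count       : number of parameters in each set
 * framelen    : synthesis frame length in samples (must be > 0)
 * both_voiced : non-zero if both frames are voiced
 * slope       : output, one increment per parameter
 */
void set_slopes_sketch(const double *prev, const double *next, int count,
                       int framelen, int both_voiced, double *slope)
{
    int i;

    for (i = 0; i < count; i++) {
        if (both_voiced)
            slope[i] = (next[i] - prev[i]) / (double)framelen;
        else
            slope[i] = 0.0;          /* hold the parameters constant */
    }
}
```

Inside the sample loop each present parameter is then advanced by its slope after every synthesized sample, which gives the smooth transition between voiced frames described above.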

4.3 Conversion Section

This section consists of various executable files which convert data from one
format to the other. This section is support dependent and should be modified
as and when required. It also includes an executable file to convert the
reflection coefficients into log area ratios. All the executable files have
their source code files with the extension ".c". The summary of programmes is
as below.

• voicish.c converts a "Speech and Voice Systems" format file into a binary
string of signed short integers.

• us.c converts the binary string file of short integers into a
'gnuplot'-compatible ASCII file.

• log.c cuts the reflection coefficients from the parameter files and
converts them into log area ratios. The output file is 'gnuplot' compatible.

• spth.c cuts the file portion interactively, asking from where the speech
starts and where it ends.

• sliivoit.c converts the binary string file of short integers into the
"Speech and Voice Systems" data format file.

• ittimt.c converts 2's complement swapped-bytes binary string files of short
integers into a binary string of signed short integers.

• ut.c finds the reflection coefficients from the parameter file, calculates
area parameters and makes 'gnuplot'-compatible files of the area function as a
function of time (2-dimension plot) and the overall vocal tract shape as a
function of time (3-dimension plot).
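
The step from reflection coefficients to area parameters can be sketched in C. Two points are assumptions made for this illustration and are not confirmed by the text: the area ratio across a junction is taken as A_{m+1}/A_m = (1 + k)/(1 - k), and the coefficient array is indexed by junction (kj[m] sits between sections m and m+1). With the opposite sign convention the two factors simply swap. The name `areas_from_reflection_sketch` is hypothetical.

```c
/*
 * Area parameters from junction reflection coefficients.
 * ASSUMED convention: area[m+1] / area[m] = (1 + kj[m]) / (1 - kj[m]).
 * kj[0..p-1] : one coefficient per junction, each with |kj[m]| < 1
 * area[0..p] : output, area[p] (glottis end) normalized to unity,
 *              area[0] at the lip end
 * Returns 0 on success, -1 if a coefficient is out of range.
 */
int areas_from_reflection_sketch(const double *kj, int p, double *area)
{
    int m;

    area[p] = 1.0;                       /* glottis area normalized to 1 */
    for (m = p - 1; m >= 0; m--) {
        if (kj[m] <= -1.0 || kj[m] >= 1.0)
            return -1;
        area[m] = area[m + 1] * (1.0 - kj[m]) / (1.0 + kj[m]);
    }
    return 0;
}
```

The log area ratio of a junction is then the natural logarithm of area[m+1]/area[m], and writing one line of areas per frame gives a file that a plotting tool can read directly.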

4.4 Control Section

This section consists of a standard K shell UNIX makefile, a programme
written in C which takes in an argument and executes various programmes by
supplying them the argument with appropriate augmentations and attributes, and
another C file containing a function which terminates the calling programme if
the file pointer passed to it is NULL.

This section is useful to maintain uniformity of file naming and to reduce
the processing effort. Just by entering 'process' and the sound file name
(with a default extension) we get proper sequential execution of all the
programmes in the sections discussed above. This is accomplished with a fixed
fashion of augmentation of the sound file name for the name of the file
generated at a particular stage of processing.

4.5 Flexibility Aspect

In the major sections of analysis and synthesis the external functions are
divided into two C files, 'common.c' and 'tool.c'. 'common.c' contains the
argument assignment, file pointer null checking and usage functions. 'tool.c'
contains general functions of digital signal processing like the average
filter, forward and inverse lattice filter, second order LPF, decimator,
interpolator, autocorrelation, Durbin's algorithm etc.

No function of 'tool.c' takes any global variable for granted. So these
functions can be added to a library of general DSP functions and save a lot of
development time. The development of such a small but useful toolbox was
necessary for smooth implementation of the work.

All analysis and/or synthesis parameters (like frame lengths, sampling rate,
default file names, maximum and minimum values of pitch expected etc.) are
defined in the header files. So the study of variation in various parameters
can be handled with minimal effort.



Chapter 5 

Discussion of Results 


With 25 VCV clusters in consideration and 10 area functions for each of them,
the entire graphical representation becomes voluminous. Here a try is made to
sketch the important results in brief.

5.1 The Basis of Presentation

• Our interest is in the estimate of variation of area functions with respect
to time for a given consonant, a transitory phenomenon. The area functions are
plotted linearly interpolated, instead of the discrete area functions plotted
in vowel analysis.

• Since the interest is to see the reflection of articulatory position in the
area function as a function of time, as a function of articulatory position
and across the utterances sharing the same place of articulation, the
dimension of the graph becomes unmanageable on paper. A solution to this is to
see the area function of interest going columnwise in the alphabet matrix. For
the sake of brevity, one representative column per area parameter of interest
is shown.

• The area functions of interest are those corresponding to articulatory
positions. Such area functions (A_0 at lips, A_10 at glottis, the latter
normalized to unity) are as follows:


Articulation              Area Functions of Interest

Velar                     A6, A5
Alveolar and Retroflex    A3, (second entry illegible)
Dental                    (illegible)
Bilabial                  A1, A0

Table 5.1: Place of Articulation and Area Functions of Interest


5.2 Conventions followed in Graphs

• The X axis has two labels: the type of utterance and the area function. The
type of utterance is UVUA: Unvoiced, Unaspirated; UVA: Unvoiced, Aspirated;
VUA: Voiced, Unaspirated; VA: Voiced, Aspirated; or Nasals. The area function
A_i is represented as Ai.

• The Y axis shows the value of the area function with A_10 normalized to
unity. The plot is linearly interpolated.

• The curve for A_i involving a particular consonant C is named 'string Ai',
where 'string' is a string dependent on C. Table 5.2 shows the string
equivalent of each member of the alphabet matrix. The alveolar non-nasals are
affricates; all other non-nasals are stops.
Place of        Manner of Articulation
Articulation    Unvoiced                    Voiced                      Nasal
                Unaspirated   Aspirated     Unaspirated   Aspirated

Velar           ak            akh           ag            agh           ang
Alveolar        ach           achh          aj            ajh           anj
Retroflex       at            athh          ad            adhh          ann
Dental          atden         ath           adden         adh           an
Bilabial        ap            aph           ab            abh           am

Table 5.2: Curve Prefixes Involving Various Consonants

5.3 Observations

The important observations from the results are summarized below.

1. For a sustained vowel, the area functions show oscillatory behaviour
[Fig 5.1].

2. The silence threshold detection works well with almost all the utterances
except some VCV utterances with a voiced aspirated consonant.

3. It is easy to distinguish between the vowel part and the consonant/silence
part from the graphs. Vowels show more vocal opening than the consonant
counterpart. Throughout the vocal tract, the consonant sections show
constriction.

4. In the case of unaspirated consonants, voiced and unvoiced consonants show
almost similar trajectories.

5. Going columnwise in the alphabet matrix, it is not possible to detect any
particular constriction for a particular place of articulation.
[Figure: Parameter A5 for Unvoiced Unaspirated Stops (X axis label: UVUA A6)]

[Figure 5.4: Parameter A6 for Unvoiced Aspirated Stops (X axis label: UVA A6)]

[Figure 5.6: Parameter A3 for Nasals]

[Figure 5.8: Parameter A1 for Unvoiced Aspirated Stops (X axis label: UVA A1;
curves: akh A1, achh A1, athh A1, ath A1, aph A1)]
5.4 Interpretations

These results make the following points clear.

1. The area functions should be calculated with pitch synchronous analysis.
Since the method adopted was not pitch synchronous, we get oscillations in the
sustained vowel part.

2. The behaviour of two classes of consonants, the voiced aspirated stops and
the bilabial stops, is remarkable and seems to be less tractable using the
present model.

3. The overall constriction of the vocal tract is detectable, but not the
'stop' phenomenon which makes the stops 'stops'. Thus our premise of getting
similarity in behaviour row-wise and difference in behaviour column-wise
fails. The possible cause behind this may be the averaging out in space and
time. Details of this interpretation are discussed in Chapter 6.



Chapter 6 

Conclusion and Suggestions for 
Future Work 

6.1 Conclusions

As is evident from the results, the waveguide model fails to detect the full
constriction. This may be because of the following causes:

• The full constriction is too transitory. The detection becomes more
difficult if the frame of analysis divides the constriction interval into two
parts.

• The constriction is not limited to a particular section of the vocal tract,
but the tongue movement constricts the whole vocal tract for the time being.
Thus averaging takes place not only in time but across the area parameters
also.

• The full constriction does not constrict one whole section of the vocal
tract in most of the cases. Thus the phenomenon gets averaged out over the
whole section.
6.2 Suggestions for Future Work

• In order to be more confident about the vowel part, a pitch synchronous
analysis should be carried out. This will help in co-articulation studies of
utterances.

• The problem associated with a transitory phenomenon can be alleviated by
choosing a smaller frame of analysis. The accuracy of the analysis can be
maintained by an increased rate of sampling. But such an analysis will be
useless for reproduction because the synthesis model assumes a set of gain,
pitch and reflection coefficients fed at a time. A smaller interval of
analysis will also give errors in pitch detection. The simplest change in the
model to circumvent the problem is to have multiple gain and reflection
coefficient analyses over the pitch period, and thus several changes in the
vocal tract driven by a single excitation.

• The problem of spatial distribution of constriction can be tackled by
increasing the number of cylindrical sections, thus increasing the number of
poles in the filter. The other possible approach is to assume a waveguide with
sections nonuniform in length. In either of the cases, however, we lose the
simplicity of the model.

• Thus our model may not be suitable for a text-to-speech conversion system
for an orthography based on articulatory phonetics. This problem may be
overcome by an increased number of rules leading to such an implementation.



Appendix A 


The recording was carried out on a female and a male speaker in the form of
isolated VCV utterances. Care was taken not to have any contextual meaning in
any utterance. The recording was monophonic.

The sampling frequency was set to 8000 Hz. The filter cutoff frequency was
set to 3500 Hz. The duration of recording was 2 s, wherein the silence was
arbitrarily recorded preceding and following the utterance.

As per the specifications supplied by the manufacturer, the filtering is done
by eighth order programmable active Butterworth lowpass filters. They have a
roll-off of 48 dB/octave [7].